Handling of analytic queries

ABSTRACT

Systems and methods for evaluating analytic queries comprising disjunctive Boolean expressions are described. A method may include receiving an analytic query comprising a first disjunctive Boolean expression. The method may further include transforming the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression. The method may also include evaluating the transformed analytic query, wherein complete evaluation of the at least one nondisjunctive Boolean expressions and the at least a second disjunctive Boolean expressions yields the same results as evaluation of the first disjunctive Boolean expression.

FIELD OF THE DISCLOSURE

The instant disclosure relates generally to the handling of analytic queries in database systems. More specifically, this disclosure relates to improved evaluation of analytic queries that include disjunctive Boolean expressions.

BACKGROUND

Analytic queries often contain disjunctive Boolean expressions, within a JOIN or UNION operation, that combine records from two or more tables in a database. Conventional database management systems typically evaluate an analytic query containing disjunctive Boolean expressions by dedicating one processor or software function to accessing the data tables (one table at a time) referenced in the query, filtered by any nondisjunctive Boolean expressions, and then transferring the resultant data records to a second processor or software function for evaluating the disjunctive Boolean expressions on all the transferred data records contained in the referenced tables. More specifically, in conventional database management systems, if the analytic query, i.e., a database operation command, includes disjunctive Boolean expressions that reference multiple tables, such as if the analytic query includes a JOIN predicate, the referenced tables must all be passed from an initial storage function, which accesses the tables but is only capable of evaluating single-table expressions, to a relational function capable of evaluating relational expressions.

However, the performance of conventional systems suffers when many data tables need to be processed to evaluate the disjunctive Boolean expressions. For example, all the data tables needed for evaluation of the disjunctive Boolean expressions need to be retrieved from a database that can include up to millions of data records spread across two or more tables, with each table possibly including millions, or even billions, of records. In addition, all the accessed data tables then need to be transferred to another processor or software function to evaluate the disjunctive Boolean expressions in the analytic query. Consequently, considerable processing resources and time may be consumed by conventional database systems to process analytic queries containing disjunctive Boolean expressions. Although the deficiencies described above relate to relational databases, non-relational databases suffer similar deficiencies.

SUMMARY

The performance of database systems evaluating analytic queries that include disjunctive Boolean expressions may be improved by internally transforming an analytic query so as to represent a disjunctive Boolean expression as a set of nondisjunctive Boolean expressions, such as SQL predicates referencing records of a single table, and another set of disjunctive Boolean expressions. By doing so, the nondisjunctive Boolean expressions may be evaluated first on single data tables to filter out data records that cannot meet the criteria of the original disjunctive Boolean expression because they do not meet the criteria of the nondisjunctive Boolean expressions extracted from the original disjunctive Boolean expression. As a result, fewer data records may be transferred to the second processor or software function, which allows the final disjunctive Boolean expressions to be evaluated faster, and yields better performance by the computer system because fewer processing resources are required to execute the analytic query.

According to one embodiment, a method for evaluating analytic queries that include disjunctive Boolean expressions may include receiving, with a processor, an analytic query comprising a first disjunctive Boolean expression. The method may also include transforming, with the processor, the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression. The method may further include evaluating, with the processor, the transformed analytic query, wherein complete evaluation of the at least one nondisjunctive Boolean expressions and the at least a second disjunctive Boolean expressions yields the same results as evaluation of the first disjunctive Boolean expression.

According to another embodiment, a computer program product may include a non-transitory computer-readable medium comprising code to perform the step of receiving an analytic query comprising a first disjunctive Boolean expression. The medium may also include code to perform the step of transforming the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression. The medium may further include code to perform the step of evaluating the transformed analytic query, wherein complete evaluation of the at least one nondisjunctive Boolean expressions and the at least a second disjunctive Boolean expressions yields the same results as evaluation of the first disjunctive Boolean expression.

According to yet another embodiment, an apparatus may include a memory and a processor coupled to the memory. The processor may be configured to execute the step of receiving an analytic query comprising a first disjunctive Boolean expression. The processor may also be configured to perform the step of transforming the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression. The processor may be further configured to perform the step of evaluating the transformed analytic query, wherein complete evaluation of the at least one nondisjunctive Boolean expressions and the at least a second disjunctive Boolean expressions yields the same results as evaluation of the first disjunctive Boolean expression.

The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the concepts and specific embodiments disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features that are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the disclosed systems and methods, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.

FIG. 1 is a schematic block diagram illustrating a first embodiment of a computing system including a database management system according to one embodiment of the disclosure.

FIG. 2 is a schematic block diagram illustrating a second embodiment of a computing system including a database management system according to one embodiment of the disclosure.

FIG. 3 is a schematic view illustrating a user application interface to a database management system and the handling of analytic queries at the database management system according to one embodiment of the disclosure.

FIG. 4 is a schematic block diagram illustrating a user application interface to a database management system and components of an access database of the database management system that handle execution of analytic queries according to one embodiment of the disclosure.

FIG. 5 is a flow chart illustrating a method for evaluating analytic queries including disjunctive Boolean expressions according to one embodiment of the disclosure.

FIGS. 6A-6B provide source code illustrating the transformation of an analytic query with disjunctive Boolean expressions for optimized execution by a database management system according to one embodiment of the disclosure.

FIG. 7 is a schematic block diagram illustrating an electronic computing device with which aspects of the present disclosure can be implemented according to one embodiment of the disclosure.

DETAILED DESCRIPTION

The performance of database management systems handling analytic queries that include disjunctive Boolean expressions may be improved by transforming the analytic queries to include nondisjunctive and disjunctive Boolean expressions. A transformed analytic query may be evaluated by first performing single-table evaluations of the nondisjunctive Boolean expressions with a storage engine of a database management system, and filtering out the records that do not meet the criteria of this first evaluation. Subsequently, only the subset of tables that met the criteria of the first evaluation, i.e., met the criteria of the nondisjunctive Boolean expression, may be transferred to the relational engine to perform the multi-table relational evaluations of the disjunctive Boolean expressions of the transformed analytic query. Transforming the analytic query so that more single-table evaluations can be performed by the storage engine allows the storage engine to serve as an initial filter of tables. By filtering the tables, fewer records may be transferred from the storage engine to the relational engine, which may lead to substantial performance improvements and reductions in time consumption because the transfer and processing of data records by the relational engine tend to be performance- and time-intensive tasks.

Database management systems allow for the management of data in a particular database, and for managing access to that data by many different user programs. The user programs may be written in a high-level language, such as C, C++, Java, or some other analogous language. In some embodiments, the user program may perform a call to the database management system when a database operation is to be performed.

FIG. 1 illustrates a system 100 for evaluating analytic queries comprising disjunctive Boolean expressions according to one embodiment of the disclosure. When a database management system receives a request to perform a database operation, it may handle the request as depicted in FIG. 1. As illustrated in FIG. 1, the system 100 may include a plurality of user applications 102 a-n, from one or more of which a query statement may be received by a database management system 104. If the database management system 104 is a relational database management system, the query statement can be, for example, a SQL statement. The database management system 104 may analyze the query statement to determine if there are any errors in the statement itself. Assuming that there are no errors, the database management system 104 may transmit a call to an operating system 106 hosting that system (shown as CREAT$IOGATE), requesting access to the database file 108. The operating system 106 may then assign the database file to the database management system 104. The database management system 104 then caches at least portions of the database file using subsequent system calls (e.g., UDS$IOW). This requested data may be modified by the database management system 104, and control may be returned to the user program for continued execution.

FIG. 2 illustrates a second embodiment of a computing system 200 including a database management system. As shown, the system 200 may include a plurality of user applications 202 a-n communicatively coupled to a database management system 204. The user applications 202 a-n can be located, for example, on the same computing device as the database management system 204, or on a different computing device or devices that are communicatively connected thereto. Examples of computing devices useable to execute computing instructions constituting the user applications 202 a-n and the database management system 204 are discussed below in connection with FIG. 7.

In the embodiment shown in FIG. 2, the database management system 204 may be a relational database management system configured to receive and process SQL commands to perform operations on database files. As illustrated herein, the database management system 204 may be hosted by an operating system 206, which provides access to one or more database files 208. The database management system 204 can be, for example, a possible embodiment of the database management system 104 of FIG. 1.

The operating system 206 can be any of a variety of operating systems capable of hosting a database management system 204 and providing access controls to data stored in database files 208 on the computing system. In one example embodiment, the operating system 206 can be the OS2200 operating system, from Unisys Corporation of Blue Bell, Pa. In alternative embodiments, other operating systems could be used as well.

In the embodiment shown in FIG. 2, the database management system 204 includes a data expanse viewer 210. The data expanse viewer manages access requests for data stored in various data expanses that are managed by the operating system 206. Cooperatively, the operating system includes a data expanse file control component 212 and a data expanse view control component 214. In some embodiments, the data expanse viewer 210 can transmit a request to the operating system 206, in the form of a call to the data expanse file control component 212, to create a data expanse defined by the attributes of a database file (shown as the call CREAT$DATAXP). This call may tell the operating system 206 to create a cache 209 for the data expanse and initialize all necessary control structures and bit maps to page the cache. Once the data expanse is initialized, an address call-back into the operating system may be returned to the database management system 204 for accessing the data expanse. The data expanse file control component 212 can then load a data file 208 into a cache 209 in the operating system, and return to the data expanse viewer a public starting address of the data expanse that is created.

Following creation of the data expanse, the data expanse viewer 210 can be used by any of the user applications 202 a-n to access data in the data expanse. For example, each of the user applications can request a different view of the data expanse, defined by a starting address (i.e., an offset from the starting address returned to the data expanse viewer 210, or an absolute address), as well as a size of the view to be created. This can be accomplished, for example, as shown, using the operating system call DX$VIEW, providing the starting address and size of the view to be created. The data expanse view control component 214 can then create a view of the data expanse, and provide to the database management system 204 a private address of the view to be created. For example, the address can be an extended virtual mode address, including a length and a bank descriptor index to reference the location of the view.
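The idea of a view defined by a starting offset and a size can be illustrated with a minimal Python sketch. It is not the DX$VIEW call or any operating system interface; the names expanse and create_view are hypothetical, and the data expanse is assumed to be a flat byte buffer for illustration only.

```python
# Hypothetical stand-in for a data expanse: a flat, shared byte buffer.
expanse = bytearray(range(256)) * 4  # 1024 bytes

def create_view(buffer, start, size):
    """Return a zero-copy window of `size` bytes beginning at offset `start`."""
    if start < 0 or size < 0 or start + size > len(buffer):
        raise ValueError("view falls outside the data expanse")
    return memoryview(buffer)[start:start + size]

# Two callers request different windows onto the same underlying buffer.
view_a = create_view(expanse, 0, 16)
view_b = create_view(expanse, 512, 16)

# A write through a view is visible in the shared buffer.
view_b[0] = 99
```

Each caller addresses only its own window, while the underlying buffer stays shared.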

In some embodiments, the data expanse may be referenced by its starting address or by an address of a particular segment; therefore, the data expanse viewer 210 and other portions of the database management system may not need to be capable of addressing the entire range of the data expanse created. Rather, the particular view created within the data expanse may be fully addressable by the database management system 204. In this way, a database management system, such as system 204, can be made compatible with data files having sizes greater than a maximum addressable range of the database management system. For example, in some embodiments of the present disclosure, the database management system 204 may be capable of individually addressing addresses in a bank of 262 k words; in such an arrangement, the database file 208 can have a size in excess of that number, since the address identified by the database management system to the operating system 206 might identify an offset on a bank-by-bank basis, rather than on an individual, word-addressable basis.
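The bank-by-bank addressing idea can be sketched as simple arithmetic. The sketch below assumes that "262 k words" means 262,144 (2**18) words per bank; the function names are hypothetical and do not correspond to any actual system interface.

```python
# Assumption: "262 k words" is read as 2**18 = 262,144 words per bank.
BANK_WORDS = 262_144

def to_bank_address(word_offset):
    """Split an absolute word offset into (bank index, offset within bank)."""
    if word_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(word_offset, BANK_WORDS)

def to_word_offset(bank_index, offset_in_bank):
    """Recombine a (bank index, offset within bank) pair into an absolute offset."""
    return bank_index * BANK_WORDS + offset_in_bank

# A file larger than one bank is still reachable: only the bank index grows.
bank, offset = to_bank_address(600_000)
```

Under this assumption, the component addressing individual words only ever handles offsets smaller than one bank.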

Furthermore, using the system 200 as illustrated, a bank may be made accessible to the database management system 204 without requiring that the database management system cache each database file; this allows the operating system 206 to maintain cache management, which can be performed more efficiently when managed at the operating system level. Still further, and in contrast to locking a bank (and corresponding database file) to a particular application, the data expanse arrangement of FIG. 2 may allow a database file to be individually accessed by many different user applications 202 a-n concurrently. By providing a private view into a publicly accessible database file, the database management system 204 and operating system 206 may alleviate many possible data conflicts.

FIG. 3 illustrates a user application interface to a database management system and the handling of analytic queries at the database management system according to one embodiment of the disclosure. In the embodiment shown, a user application 302 includes code, which can be written, for example, in C (as shown), or any of a variety of other types of programming languages, such as C++, Java, or other languages. The user application 302 may include a call to a database management program 304 within the code of the application, to effect access to a database, e.g., to read or edit data in the database managed by the database management program 304. In the example embodiment shown, the user application 302 includes a line of code in the C programming language indicated as “EXEC SQL . . . ” which may represent a form of a line of code useable to call the database management program 304. In various embodiments, the user application 302 can be any of a variety of user applications useable to access and/or modify data in a database.

The database management program 304 can be any program implementing a database management system that is hosted by an underlying operating system. In an example embodiment, the database management program may correspond to a relational database management system, such as the relational database management system (RDMS) available from Unisys Corporation of Blue Bell, Pa. In alternative embodiments, other types of database management systems, and other arrangements of databases, could be used as well.

In the embodiment shown, the database management program 304 may include a syntax analyzer component 306, an access component 308, and an error status component 310. The components 306-310 may be used to access and validate access requests to data on behalf of the user application 302 by the database management program 304. The syntax analyzer component 306 may receive a database command from the user application 302, such as a SQL command, or other database command that will allow the application program to select, delete, insert, and update data in a database. In some embodiments, the syntax analyzer component 306 may determine an operation to be performed based on parsing of the received command.

If no errors are detected in the command, the access component 308 may interface with an underlying operating system to access a file containing data associated with the database accessed by the user application 302 for parsing and use by the database management program 304. The access component 308 can then execute the database command as defined in the received command. In such a case, an error status component 310 may be set to indicate that no error has occurred in completion of the database operation. However, if an error is detected in the syntax analyzer component 306, or during performance of the command by the access component 308, error status component 310 may indicate that an error exists in the received database command. Accordingly, either a confirmation of the properly executed database command, or an indication of an error, can be returned from the error status component 310 to the user application 302.

In some embodiments, beyond passage of particular database commands from the user application 302, it is also possible for the database management program 304 to allow use of placeholder variables in a command string, and therefore transfer values from program variables in the C code of the program to the database, or from the database to program variables in the C code, thereby integrating the database as a mechanism for storage of large program constructs, such as variables of large size, variable classes, and other data structures used as variables and for which storage in a database is convenient.

FIG. 4 illustrates a user application interface to a database management system and components of an access database of the database management system that handle execution of analytic queries. In the embodiment shown in FIG. 4, the database management program 404 may include a syntax analyzer component 406 and an access component 408. The access component may include a relational engine 410 and a storage engine 412. In some embodiments, the relational engine 410 and the storage engine 412 may execute on different processors. In another embodiment, the relational engine 410 and the storage engine 412 may execute as different functions on the same processor on which the access component 408 executes. In one embodiment, the relational engine 410 may be an Abstract Machine (AB) and the storage engine 412 may be a Relational Storage Manager (RSM). Alternatively, other types of database management systems, and other arrangements of databases, could be used as well.

In some embodiments, the syntax analyzer component 406 may determine an operation to be performed based on parsing of a command received from a user application 402, which may function similarly to user application 302. According to an embodiment, if the syntax analyzer component 406 detects no error in the received database command, the database command may be executed by the access database component 408. The access database component 408 may utilize a relational engine 410 and a storage engine 412 to perform the database operations specified in the received database command. According to an embodiment, the relational engine 410 may be utilized to implement relational algorithms, such as, for example, SQL JOIN, UNION, or UNION ALL, that combine records from two or more tables in the database, such as database 108 or 208. The storage engine 412 may be utilized to execute SQL predicates that reference records of a single table. To execute database operations, the relational engine 410 may call the storage engine 412 to retrieve records from a single table that match certain criteria, such as criteria specified in a database operation command received from the user application 402.
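The division of labor between the two engines can be sketched in Python as follows. This is an illustrative model only, not the Abstract Machine or Relational Storage Manager; the class names, method names, table names, and records are all hypothetical.

```python
class StorageEngine:
    """Evaluates predicates that reference records of a single table."""

    def __init__(self, tables):
        self.tables = tables  # table name -> list of record dicts

    def scan(self, table, predicate=None):
        # Return only the records of one table that satisfy the predicate.
        return [r for r in self.tables[table] if predicate is None or predicate(r)]


class RelationalEngine:
    """Combines records from two tables, calling the storage engine for rows."""

    def __init__(self, storage):
        self.storage = storage

    def join(self, left, right, condition, left_pred=None, right_pred=None):
        # Single-table criteria are handed down to the storage engine.
        left_rows = self.storage.scan(left, left_pred)
        right_rows = self.storage.scan(right, right_pred)
        # The multi-table condition is evaluated here.
        return [(l, r) for l in left_rows for r in right_rows if condition(l, r)]


tables = {
    "orders": [
        {"id": 1, "cust": 10, "region": "EU"},
        {"id": 2, "cust": 11, "region": "APAC"},
    ],
    "customers": [
        {"id": 10, "tier": "gold"},
        {"id": 11, "tier": "basic"},
    ],
}
engine = RelationalEngine(StorageEngine(tables))
pairs = engine.join(
    "orders",
    "customers",
    condition=lambda o, c: o["cust"] == c["id"],
    left_pred=lambda o: o["region"] == "EU",
)
```

In this model, every record that the storage engine rejects never reaches the join loop in the relational engine.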

In view of exemplary systems shown and described herein, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to various functional block diagrams. While, for purposes of simplicity of explanation, methodologies are shown and described as a series of acts/blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the number or order of blocks, as some blocks may occur in different orders and/or at substantially the same time with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement methodologies described herein. It is to be appreciated that functionality associated with blocks may be implemented by software, hardware, a combination thereof or any other suitable means (e.g., device, system, process, or component). Additionally, it should be further appreciated that methodologies disclosed throughout this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to various devices. Those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram.

FIG. 5 illustrates a method 500 for evaluating analytic queries that include disjunctive Boolean expressions. Embodiments of method 500 may be implemented with the systems described above with respect to FIGS. 1-4. Specifically, method 500 includes, at block 502, receiving an analytic query that includes a first disjunctive Boolean expression. In some embodiments, a processor on which a database management system, such as database management systems 104, 204, 304, or 404, executes may receive the analytic query. According to an embodiment, the received analytic query may specify a plurality of data tables, each of which may include numerous data records, to be evaluated in accordance with the first disjunctive Boolean expression. For example, the first disjunctive Boolean expression may require the processing of at least two data tables of the plurality of data tables to evaluate the first disjunctive Boolean expression.

According to an embodiment, the processor may be configured to, upon receiving the analytic query, retrieve the plurality of data tables referenced by the first disjunctive Boolean expression of the analytic query. For example, the data tables may be retrieved from a database, such as database 108 or 208.

At block 504, method 500 includes transforming the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression. That is, the analytic query may be transformed such that the first disjunctive Boolean expression is represented as a combination of nondisjunctive Boolean expressions, i.e., non-relational Boolean expressions, and disjunctive Boolean expressions, i.e., relational Boolean expressions. In some embodiments, the transformation may be performed by an analytic query syntax analyzer function of a database management system, such as syntax analyzer component 306 or 406. In addition, in some embodiments, a nondisjunctive Boolean expression may refer to a SQL predicate that does not reference multiple data tables, thereby making the nondisjunctive Boolean expression a non-relational Boolean expression. Therefore, in some embodiments, a nondisjunctive Boolean expression may be a Boolean expression that may be evaluated using only single-table processing. As mentioned, this is in contrast to a disjunctive Boolean expression, which references multiple tables of a database, thereby making it a multi-table relational Boolean expression.
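One way such a transformation could be sketched is shown below in Python. It is not the source code of FIGS. 6A-6B, which is not reproduced here. The sketch assumes the disjunctive expression is given as a list of OR branches, each a list of AND-ed atoms tagged with the set of tables the atom references; the function name, table aliases, and predicates are hypothetical. A single-table filter is derived for a table only when every branch constrains that table, since only then is the filter implied by the whole expression.

```python
def implied_single_table_filters(disjuncts):
    """Derive single-table predicates implied by an OR of AND-ed atoms.

    disjuncts: list of branches; each branch is a list of (tables, sql) atoms,
    where `tables` is the set of table aliases the atom references.
    Returns a dict mapping table alias -> implied single-table SQL predicate.
    """
    all_tables = set().union(*(tables for branch in disjuncts for tables, _ in branch))
    filters = {}
    for table in sorted(all_tables):
        per_branch = []
        for branch in disjuncts:
            atoms = [sql for tables, sql in branch if tables == {table}]
            if not atoms:
                break  # this branch does not constrain the table: nothing is implied
            per_branch.append("(" + " AND ".join(atoms) + ")")
        else:
            filters[table] = " OR ".join(per_branch)
    return filters


# Hypothetical expression:
#   (o.region = 'EU' AND o.cust_id = c.id AND c.tier = 'gold')
#   OR (o.region = 'US' AND o.cust_id = c.id)
disjuncts = [
    [({"o"}, "o.region = 'EU'"), ({"o", "c"}, "o.cust_id = c.id"), ({"c"}, "c.tier = 'gold'")],
    [({"o"}, "o.region = 'US'"), ({"o", "c"}, "o.cust_id = c.id")],
]
filters = implied_single_table_filters(disjuncts)
```

Here a filter is derived for table o but not for table c, because the second branch places no single-table condition on c. The original expression would still be kept as the residual multi-table expression.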

Method 500 further includes, at block 506, evaluating the transformed analytic query. In some embodiments, complete evaluation of the combination of nondisjunctive and disjunctive Boolean expressions that make up the transformed analytic query may yield the same results as the evaluation of the first disjunctive Boolean expression.

According to an embodiment, evaluation of the transformed analytic query, such as at block 506, may include evaluating each of the at least one nondisjunctive Boolean expressions. In some embodiments, evaluation of a nondisjunctive Boolean expression may require the processing of only a single data table of the plurality of data tables. For example, upon retrieving a single data table referenced by the original disjunctive Boolean expression of the original received analytic query, the single data table may be processed to completely evaluate one of the at least one nondisjunctive Boolean expressions. Because the nondisjunctive Boolean expression is a Boolean expression derived from the original disjunctive Boolean expression of the analytic query, if a data table does not satisfy the criteria of the nondisjunctive Boolean expression, then the data table may be disregarded such that no further processing is performed on the data table to evaluate the original disjunctive Boolean expression of the analytic query.

Evaluation of the transformed analytic query, such as at block 506, may also include filtering the plurality of data tables to obtain a subset of the plurality of data tables. According to one embodiment, the subset of data tables may be data tables processed for the evaluation of each of the at least one nondisjunctive Boolean expressions and for which the evaluation of the data tables yielded a first Boolean value. For example, each of the data tables referenced by the original disjunctive Boolean expression of the original received analytic query may be processed to evaluate whether they yield a first Boolean value, such as a Boolean value of TRUE indicating that the data table meets all of the criteria of the nondisjunctive Boolean expression. Data tables yielding the first Boolean value, that is, a value of TRUE, may be grouped to create the subset of data tables to be further processed. The data tables not yielding the first Boolean value, such as those yielding a Boolean value of FALSE indicating that the data tables do not meet all the criteria of the nondisjunctive Boolean expression, may be disregarded, and thereby filtered out from any further processing.

In some embodiments, evaluation of the transformed analytic query, such as at block 506, may further include evaluating each of the at least a second disjunctive Boolean expressions. In some embodiments, evaluation of each of the at least a second disjunctive Boolean expressions may require the processing of at least two data tables from the subset of data tables. In other words, in some embodiments, only the data tables that yielded a Boolean value of TRUE when evaluated with the nondisjunctive Boolean expressions may be evaluated with the subset of disjunctive Boolean expressions derived from the original disjunctive Boolean expression of the analytic query. According to an embodiment, completion of the evaluating of each of the at least a second disjunctive Boolean expressions may yield the same result as the evaluation of the first disjunctive Boolean expression, except that fewer processing resources may have been used and less time may have been consumed to arrive at the same result.
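The equivalence of results can be checked on a small example. The Python sketch below uses hypothetical tables, records, and predicates, and it assumes that the second disjunctive expression is simply the original expression applied to the pre-filtered records; an actual embodiment may derive a different residual expression.

```python
# Hypothetical sample data.
orders = [
    {"id": 1, "cust": 10, "region": "EU"},
    {"id": 2, "cust": 11, "region": "US"},
    {"id": 3, "cust": 10, "region": "APAC"},
    {"id": 4, "cust": 12, "region": "APAC"},
]
customers = [
    {"id": 10, "tier": "gold"},
    {"id": 11, "tier": "basic"},
    {"id": 12, "tier": "gold"},
]

def original(o, c):
    """First disjunctive expression: references both tables."""
    return ((o["region"] == "EU" and o["cust"] == c["id"] and c["tier"] == "gold")
            or (o["region"] == "US" and o["cust"] == c["id"]))

def single_table_filter(o):
    """Nondisjunctive expression implied by `original`: references orders only."""
    return o["region"] in ("EU", "US")

# Untransformed evaluation: every order is paired with every customer.
baseline = [(o["id"], c["id"]) for o in orders for c in customers if original(o, c)]

# Transformed evaluation: filter one table first, then evaluate the rest.
kept = [o for o in orders if single_table_filter(o)]
transformed = [(o["id"], c["id"]) for o in kept for c in customers if original(o, c)]

pairs_without_filter = len(orders) * len(customers)
pairs_with_filter = len(kept) * len(customers)
```

In this example both evaluations return the same two matches, while the filtered evaluation examines half as many record pairs.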

In essence, by evaluating the data tables with the nondisjunctive Boolean expressions and filtering out the data tables that do not meet the criteria of the nondisjunctive Boolean expressions, fewer data tables may require evaluation with disjunctive Boolean expressions than if the analytic query had not been transformed, such as at block 504, because without the transformation all the data tables would need to have been evaluated with the original disjunctive Boolean expressions. Because evaluating data tables with disjunctive Boolean expressions instead of nondisjunctive Boolean expressions requires more processing power and time, evaluating data tables in accordance with method 500 may be advantageous over conventional systems for evaluating analytic queries containing disjunctive Boolean expressions.

In some embodiments, processing of the data tables to evaluate the nondisjunctive Boolean expressions and to evaluate the disjunctive Boolean expressions may be performed with a single processor. For example, the evaluation of the nondisjunctive and disjunctive Boolean expressions may be performed with a processor on which a database management system executes. In such embodiments, a database management system may utilize a relational engine function, such as relational engine 410, and a storage engine function, such as storage engine 412, to completely evaluate the first disjunctive Boolean expression of the original analytic query or the combination of nondisjunctive and disjunctive Boolean expressions that make up the transformed analytic query.

In other embodiments, evaluating each of the at least one nondisjunctive Boolean expressions may include evaluating with a first subprocessor of a processor, and evaluating each of the at least a second disjunctive Boolean expressions may include evaluating with a second subprocessor of the processor. In other words, a first processor may be used to process data tables a first time to evaluate the nondisjunctive Boolean expressions, and a second processor may be used to process a subset of the data tables a second time to evaluate the disjunctive Boolean expressions, such as at block 410. As an example, a database management system may include multiple processors. The storage engine may execute on one processor to evaluate each of the at least one nondisjunctive Boolean expressions, and the relational engine may execute on another processor to evaluate each of the at least a second disjunctive Boolean expressions. In some embodiments, the database management system that includes the relational engine and the storage engine may execute on one of the processors that executes either the storage engine or the relational engine. In other embodiments, the database management system may execute on a processor that is part of a larger server system that includes the relational engine processor and the storage engine processor.

In embodiments in which separate processors are used for the storage engine and the relational engine, only the subset of tables may be transferred from the first subprocessor, i.e., the storage engine processor, to the second subprocessor, i.e., the relational engine processor. Transferring only the subset of data records may prevent transfer of data records that are sure not to meet the criteria of the original disjunctive Boolean expression. Because the nondisjunctive Boolean expressions are derived from the original first disjunctive Boolean expression, data that does not meet the criteria of the nondisjunctive Boolean expressions cannot meet the criteria of the original first disjunctive Boolean expression. By not transferring the data records sure not to meet the criteria of the original disjunctive Boolean expressions of the analytic query, the processing and time intensive task of data transfer may be reduced, thereby further improving processor performance and reducing time consumption.
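As a hedged illustration of the reduced transfer described above, the following Python sketch models a storage engine that can apply only single-table predicates and a relational engine that evaluates the remaining two-table disjunction. The StorageEngine class, the relational_engine and run functions, the table names a and b, and all data values are hypothetical. The two engines are modeled as ordinary objects in one process rather than as separate processors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]


@dataclass
class StorageEngine:
    """Hypothetical storage engine: single-table access and filtering only."""
    tables: Dict[str, List[Row]]
    transferred: int = 0  # records handed over to the relational engine

    def scan(self, name: str,
             row_filter: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        rows = self.tables[name]
        if row_filter is not None:
            rows = [r for r in rows if row_filter(r)]
        self.transferred += len(rows)
        return rows


def relational_engine(left: List[Row],
                      right: List[Row]) -> List[Tuple[int, int, int]]:
    """Evaluate the two-table disjunctive expression on the received records."""
    return sorted(
        (a["k"], a["x"], b["y"])
        for a in left
        for b in right
        if a["k"] == b["k"]
        and ((a["x"] < 10 and b["y"] > 100) or (a["x"] < 20 and b["y"] > 200))
    )


# Illustrative data only.
TABLES = {
    "a": [{"k": 1, "x": 5}, {"k": 2, "x": 15}, {"k": 3, "x": 50}, {"k": 4, "x": 60}],
    "b": [{"k": 1, "y": 150}, {"k": 2, "y": 250}, {"k": 3, "y": 300},
          {"k": 4, "y": 50}, {"k": 2, "y": 120}],
}


def run(push_down: bool):
    """Return (result rows, number of records transferred between engines)."""
    storage = StorageEngine(TABLES)
    # Single-table predicates implied by both branches of the disjunction.
    a_filter = (lambda r: r["x"] < 20) if push_down else None
    b_filter = (lambda r: r["y"] > 100) if push_down else None
    result = relational_engine(storage.scan("a", a_filter),
                               storage.scan("b", b_filter))
    return result, storage.transferred


if __name__ == "__main__":
    print(run(push_down=False))
    print(run(push_down=True))
```

Under these assumptions the result rows are identical in both runs, and the run with the single-table predicates applied in the storage engine hands fewer records to the relational engine.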

The schematic flow chart diagram of FIG. 5 is generally set forth as a logical flow chart diagram. As such, the depicted order and labeled steps are indicative of one aspect of the disclosed method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagram, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.

FIGS. 6A-6B provide source code illustrating the transformation of an analytic query with disjunctive Boolean expressions for optimized execution by a database management system according to one embodiment of the disclosure. In particular, FIG. 6A illustrates source code for a received SQL analytic TPCH query (query 19). FIG. 6B illustrates the resulting source code after the received SQL analytic TPCH query has been transformed, such as by the method of FIG. 5, and includes comments describing features of the resultant source code. Both queries, i.e., the original received analytic query and the transformed analytic query, may have the same access path generated by an RDMS optimizer, such as syntax analyzer component 306 or 406. The access paths for the original received analytic query and the transformed analytic query are provided at the end of the source code illustrated in FIG. 6A and FIG. 6B, respectively.
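For readers without access to the figures, the following Python sketch runs a heavily simplified query in the style of TPC-H query 19 against an in-memory SQLite database, once in its original disjunctive form and once in a transformed form. It does not reproduce the source code of FIG. 6A or FIG. 6B. The reduced schema, the l_price column, the two-branch predicate, the data values, and the particular factoring shown are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (p_partkey INTEGER, p_brand TEXT, p_size INTEGER);
CREATE TABLE lineitem (l_partkey INTEGER, l_quantity INTEGER,
                       l_shipinstruct TEXT, l_price REAL);
INSERT INTO part VALUES
  (1, 'Brand#12', 3), (2, 'Brand#23', 8),
  (3, 'Brand#12', 20), (4, 'Brand#99', 4);
INSERT INTO lineitem VALUES
  (1, 5,  'DELIVER IN PERSON', 100.0),
  (1, 5,  'NONE',               50.0),
  (2, 15, 'DELIVER IN PERSON', 200.0),
  (2, 40, 'DELIVER IN PERSON', 400.0),
  (3, 5,  'DELIVER IN PERSON', 800.0),
  (4, 5,  'DELIVER IN PERSON', 1600.0);
""")

# Original form: every condition sits inside a branch of the disjunction.
ORIGINAL = """
SELECT SUM(l_price) FROM lineitem, part
WHERE (p_partkey = l_partkey AND p_brand = 'Brand#12'
       AND l_quantity BETWEEN 1 AND 11 AND p_size BETWEEN 1 AND 5
       AND l_shipinstruct = 'DELIVER IN PERSON')
   OR (p_partkey = l_partkey AND p_brand = 'Brand#23'
       AND l_quantity BETWEEN 10 AND 20 AND p_size BETWEEN 1 AND 10
       AND l_shipinstruct = 'DELIVER IN PERSON')
"""

# Transformed form (assumed factoring): conditions implied by every branch
# are hoisted out as single-table predicates; the residual disjunction remains.
TRANSFORMED = """
SELECT SUM(l_price) FROM lineitem, part
WHERE p_partkey = l_partkey
  AND l_shipinstruct = 'DELIVER IN PERSON'   -- lineitem only
  AND l_quantity BETWEEN 1 AND 20            -- lineitem only
  AND p_size BETWEEN 1 AND 10                -- part only
  AND ((p_brand = 'Brand#12' AND l_quantity BETWEEN 1 AND 11
        AND p_size BETWEEN 1 AND 5)
    OR (p_brand = 'Brand#23' AND l_quantity BETWEEN 10 AND 20
        AND p_size BETWEEN 1 AND 10))
"""

original_revenue = conn.execute(ORIGINAL).fetchone()[0]
transformed_revenue = conn.execute(TRANSFORMED).fetchone()[0]
print(original_revenue, transformed_revenue)
```

On this toy data the two statements return the same sum. Whether a given optimizer produces the same access path for both forms depends on the database management system and is not demonstrated by this sketch.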

Although FIGS. 5 and 6 describe embodiments in which the analytic query is a SQL database operation command, the embodiments of this disclosure are not limited to SQL. For example, the embodiments of this disclosure may also be applicable to the Hadoop NoSQL language Pig. In general, the embodiments of this disclosure may be applicable to various database command languages so long as they perform the functions as specified in the appended claims.

FIG. 7 illustrates an electronic computing device with which aspects of the present disclosure can be implemented according to one embodiment of the disclosure. In particular, the computing device 700 can represent a native computing device, such as a computing system on which any of a variety of the systems of FIGS. 1-4 can be implemented.

In the embodiment of FIG. 7, the computing device 700 may include a memory 702, a processing system 704, a secondary storage device 706, a network interface card 708, a video interface 710, a display unit 712, an external component interface 714, and a communication medium 716. The memory 702 may include one or more computer storage media capable of storing data and/or instructions. In different embodiments, the memory 702 may be implemented in different ways. For example, the memory 702 can be implemented using various types of computer storage media.

The processing system 704 may include one or more processing units. A processing unit is a physical device or article of manufacture comprising one or more integrated circuits that selectively execute software instructions. The processing system 704 can be implemented in various ways. For example, the processing system 704 can be implemented as one or more processing cores. In another example, the processing system 704 can include one or more separate microprocessors. In yet another example embodiment, the processing system 704 can include an application-specific integrated circuit (ASIC) that provides specific functionality. In yet another example, the processing system 704 may provide specific functionality by using an ASIC and by executing computer-executable instructions.

The secondary storage device 706 may include one or more computer storage media. The secondary storage device 706 may store data and software instructions not directly accessible by the processing system 704. In other words, the processing system 704 may perform an I/O operation to retrieve data and/or software instructions from the secondary storage device 706. In various embodiments, the secondary storage device 706 includes various types of computer storage media. For example, the secondary storage device 706 can include one or more magnetic disks, magnetic tape drives, optical discs, solid state memory devices, and/or other types of computer storage media.

The network interface card 708 may allow the computing device 700 to send data to and receive data from a communication network. In different embodiments, the network interface card 708 may be implemented in different ways. For example, the network interface card 708 can be implemented as an Ethernet interface, a token-ring network interface, a fiber optic network interface, a wireless network interface (e.g., Wi-Fi, WiMax, etc.), or another type of network interface.

The video interface 710 may allow the computing device 700 to output video information to the display unit 712. The display unit 712 can be various types of devices for displaying video information, such as a cathode-ray tube display, an LCD display panel, a plasma screen display panel, a touch-sensitive display panel, an LED screen, or a projector. The video interface 710 can communicate with the display unit 712 in various ways, such as via a Universal Serial Bus (USB) connector, a VGA connector, a digital visual interface (DVI) connector, an S-Video connector, a High-Definition Multimedia Interface (HDMI) interface, or a DisplayPort connector.

The external component interface 714 may allow the computing device 700 to communicate with external devices. For example, the external component interface 714 can be a USB interface, a FireWire interface, a serial port interface, a parallel port interface, a PS/2 interface, and/or another type of interface that enables the computing device 700 to communicate with external devices. In various embodiments, the external component interface 714 may allow the computing device 700 to communicate with various external components, such as external storage devices, input devices, speakers, modems, media player docks, other computing devices, scanners, digital cameras, and fingerprint readers.

The communications medium 716 may facilitate communication among the hardware components of the computing device 700. In the example of FIG. 7, the communications medium 716 may facilitate communication among the memory 702, the processing system 704, the secondary storage device 706, the network interface card 708, the video interface 710, and the external component interface 714. The communications medium 716 can be implemented in various ways. For example, the communications medium 716 can include a PCI bus, a PCI Express bus, an accelerated graphics port (AGP) bus, a serial Advanced Technology Attachment (ATA) interconnect, a parallel ATA interconnect, a Fibre Channel interconnect, a USB bus, a Small Computer System Interface (SCSI) interface, or another type of communications medium.

The memory 702 may store various types of data and/or software instructions. For instance, the memory 702 may store a Basic Input/Output System (BIOS) 718 and an operating system 720. The BIOS 718 may include a set of computer-executable instructions that, when executed by the processing system 704, cause the computing device 700 to boot up. The operating system 720 may include a set of computer-executable instructions that, when executed by the processing system 704, cause the computing device 700 to provide an operating system that coordinates the activities and sharing of resources of the computing device 700. Furthermore, the memory 702 may store application software 722. The application software 722 may include computer-executable instructions that, when executed by the processing system 704, cause the computing device 700 to provide one or more applications. The memory 702 also stores program data 724. The program data 724 may be data used by programs that execute on the computing device 700.

Although particular features are discussed herein as included within an electronic computing device 700, it is recognized that in certain embodiments not all such components or features may be included within a computing device executing according to the methods and systems of the present disclosure. Furthermore, different types of hardware and/or software systems could be incorporated into such an electronic computing device.

Those of skill would appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software stored on a computing device and executed by one or more processing devices, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosure.

In some embodiments, the techniques or steps of a method described in connection with the aspects disclosed herein may be embodied directly in hardware, in software executed by a processor, or in a combination of the two. In some embodiments, a processor, as referred to herein, may be a single processor or a processor comprised of multiple processors, which can be referred to as subprocessors of the processor. In addition, different processors may be designed differently and may perform different functions, but variations in the design or functionality of the processor should not be interpreted as deviations from the processors of this disclosure so long as the processors are capable of substantially performing the functions of the processors described herein. In some aspects of the disclosure, any software module, software layer, or thread described herein may comprise an engine comprising firmware or software and hardware configured to perform aspects described herein. In general, functions of a software module or software layer described herein may be embodied directly in hardware, or embodied as software executed by a processor, or embodied as a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium may be coupled to the processor such that the processor can read data from, and write data to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user device. In the alternative, the processor and the storage medium may reside as discrete components in a user device.

If implemented in firmware and/or software, the functions described above may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media include physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks and blu-ray discs. Generally, disks reproduce data magnetically, and discs reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.

In addition to storage on computer-readable medium, instructions and/or data may be provided as signals on transmission media included in a communication apparatus. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.

While the embodiments described herein have been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the embodiments can be embodied in other specific forms without departing from the spirit of the embodiments. Thus, one of ordinary skill in the art would understand that the embodiments described herein are not to be limited by the foregoing illustrative details or interrelationships between embodiments, but rather are to be defined by the appended claims.

Although the present disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims. For example, although certain examples above describe either relational or non-relational database systems, aspects of the disclosure above may be applied to either relational or non-relational database systems. Thus, descriptions of “records” of a relational database also apply to “values” of a non-relational database, and descriptions of “tables” of a relational database also apply to “files” of a non-relational database. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the present invention, disclosure, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present disclosure. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

What is claimed is:
 1. A method for evaluating analytic queries comprising disjunctive Boolean expressions, comprising: receiving, with a processor, an analytic query comprising a first disjunctive Boolean expression; transforming, with the processor, the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression; and evaluating, with the processor, the transformed analytic query, wherein complete evaluation of the at least one nondisjunctive Boolean expressions and the at least a second disjunctive Boolean expressions yields the same results as evaluation of the first disjunctive Boolean expression, wherein evaluation of the at least one nondisjunctive Boolean expression comprises retrieving and processing of a single data table referenced by the first disjunctive Boolean expression of the analytic query, and wherein no processing is performed on the data table when the data table does not satisfy a criteria of the nondisjunctive Boolean expression.
 2. The method of claim 1, wherein the received analytic query specifies a plurality of data tables comprising one or more records to be evaluated in accordance with the first disjunctive Boolean expression.
 3. The method of claim 2, wherein the first disjunctive Boolean expression requires the processing of at least two data tables of the plurality of data tables to evaluate the first disjunctive Boolean expression.
 4. The method of claim 2, wherein evaluating the transformed analytic query comprises: evaluating each of the at least one nondisjunctive Boolean expressions, wherein evaluation of a nondisjunctive Boolean expression requires the processing of only a single data table of the plurality of data tables; filtering the plurality of data tables to obtain a subset of the plurality of data tables, wherein the subset of data tables comprises data tables processed for the evaluation of each of the at least one nondisjunctive Boolean expressions and for which the evaluation of the data tables yielded a first Boolean value; and evaluating each of the at least a second disjunctive Boolean expressions, wherein evaluation of each of the at least a second disjunctive Boolean expressions requires the processing of at least two data tables from the subset of data tables.
 5. The method of claim 4, wherein evaluating each of the at least one nondisjunctive Boolean expressions comprises evaluating with a first subprocessor of the processor, and wherein evaluating each of the at least a second disjunctive Boolean expressions comprises evaluating with a second subprocessor of the processor.
 6. The method of claim 5, further comprising transferring the subset of data tables from the first subprocessor to the second subprocessor.
 7. A computer program product, comprising: a non-transitory computer-readable medium comprising instructions which, when executed by a processor of a computing system, cause the processor to perform the steps of: receiving an analytic query comprising a first disjunctive Boolean expression; transforming the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression; and evaluating the transformed analytic query, wherein complete evaluation of the at least one nondisjunctive Boolean expressions and the at least a second disjunctive Boolean expressions yields the same results as evaluation of the first disjunctive Boolean expression, wherein evaluation of the at least one nondisjunctive Boolean expression comprises retrieving and processing of a single data table referenced by the first disjunctive Boolean expression of the analytic query, and wherein no processing is performed on the data table when the data table does not satisfy a criteria of the nondisjunctive Boolean expression.
 8. The computer program product of claim 7, wherein the received analytic query specifies a plurality of data tables comprising one or more records to be evaluated in accordance with the first disjunctive Boolean expression.
 9. The computer program product of claim 8, wherein the first disjunctive Boolean expression requires the processing of at least two data tables of the plurality of data tables to evaluate the first disjunctive Boolean expression.
 10. The computer program product of claim 8, wherein the medium further comprises instructions to cause the processor to perform the steps of: evaluating each of the at least one nondisjunctive Boolean expressions, wherein evaluation of a nondisjunctive Boolean expression requires the processing of only a single data table of the plurality of data tables; filtering the plurality of data tables to obtain a subset of the plurality of data tables, wherein the subset of data tables comprises data tables processed for the evaluation of each of the at least one nondisjunctive Boolean expressions and for which the evaluation of the data tables yielded a first Boolean value; and evaluating each of the at least a second disjunctive Boolean expressions, wherein evaluation of each of the at least a second disjunctive Boolean expressions requires the processing of at least two data tables from the subset of data tables.
 11. The computer program product of claim 10, wherein evaluating each of the at least one nondisjunctive Boolean expressions comprises evaluating with a first subprocessor of the processor, and wherein evaluating each of the at least a second disjunctive Boolean expressions comprises evaluating with a second subprocessor of the processor.
 12. The computer program product of claim 11, wherein the medium further comprises instructions to cause the processor to perform the step of transferring the subset of data tables from the first subprocessor to the second subprocessor.
 13. An apparatus, comprising: a memory; and a processor coupled to the memory, the processor configured to execute the steps of: receiving an analytic query comprising a first disjunctive Boolean expression; transforming the analytic query to obtain a transformed analytic query comprising at least one nondisjunctive Boolean expression and at least a second disjunctive Boolean expression; and evaluating the transformed analytic query, wherein complete evaluation of the at least one nondisjunctive Boolean expressions and the at least a second disjunctive Boolean expressions yields the same results as evaluation of the first disjunctive Boolean expression, wherein evaluation of the at least one nondisjunctive Boolean expression comprises retrieving and processing of a single data table referenced by the first disjunctive Boolean expression of the analytic query, and wherein no processing is performed on the data table when the data table does not satisfy a criteria of the nondisjunctive Boolean expression.
 14. The apparatus of claim 13, wherein the received analytic query specifies a plurality of data tables comprising one or more records to be evaluated in accordance with the first disjunctive Boolean expression.
 15. The apparatus of claim 14, wherein the first disjunctive Boolean expression requires the processing of at least two data tables of the plurality of data tables to evaluate the first disjunctive Boolean expression.
 16. The apparatus of claim 14, wherein the processor is further configured to perform the steps of: evaluating each of the at least one nondisjunctive Boolean expressions, wherein evaluation of a nondisjunctive Boolean expression requires the processing of only a single data table of the plurality of data tables; filtering the plurality of data tables to obtain a subset of the plurality of data tables, wherein the subset of data tables comprises data tables processed for the evaluation of each of the at least one nondisjunctive Boolean expressions and for which the evaluation of the data tables yielded a first Boolean value; and evaluating each of the at least a second disjunctive Boolean expressions, wherein evaluation of each of the at least a second disjunctive Boolean expressions requires the processing of at least two data tables from the subset of data tables.
 17. The apparatus of claim 16, wherein evaluating each of the at least one nondisjunctive Boolean expressions comprises evaluating with a first subprocessor of the processor, and wherein evaluating each of the at least a second disjunctive Boolean expressions comprises evaluating with a second subprocessor of the processor.
 18. The apparatus of claim 17, wherein the subset of data tables are transferred from the first subprocessor to the second subprocessor.